Abstract The main characteristics of the significant wave
height in an area of increased interest, the north Atlantic
ocean, are studied based on satellite records and corresponding
simulations obtained from the numerical wave prediction model
WAM. The two data sets are analyzed by means of a variety of
statistical measures, focusing mainly on the distributions that
they form. Moreover, new techniques for the estimation and
minimization of the discrepancies between the observed and
modeled values are proposed, based on ideas and methodologies
from a relatively new branch of mathematics, information
geometry. The results show that the modeled values overestimate
the corresponding observations throughout the study period. In
addition, 2-parameter Weibull distributions fit the data in the
study well. However, one cannot use the same probability density
function to describe the whole study area, since the
corresponding scale and shape parameters deviate significantly
for points belonging to different regions. This variation should
be taken into account in optimization or assimilation
procedures, which is possible by means of information geometry
techniques.

Keywords  Numerical  wave  prediction  models  • 
Distribution  of  significant  wave  height  •  Radar  altimetry  • 
Information  geometry  •  Fisher  information  metric 

1  Introduction 

In a demanding scientific and operational environment, the
importance of high quality sea state information is constantly
increasing. This reflects the significant number of applications
that are affected: climate change, transportation, marine
pollution, wave energy production and ship safety are among
them.

One of the most credible approaches towards accurate sea state
forecasting products is the use of numerical wave prediction
systems in combination with atmospheric models (see, e.g.,
WAMDIG 1988; Lionello et al. 1992; Komen et al. 1994; Chu and
Cheng 2008). Such systems have proved successful in simulating
the general sea state conditions on global or intermediate
scales. However, when focusing on local characteristics,
systematic errors usually appear (see Janssen et al. 1987; Chu
et al. 2004; Chu and Cheng 2007; Makarynskyy 2004, 2005;
Greenslade and Young 2005; Galanis et al. 2006, 2009; Emmanouil
et al. 2007). This is a multi-parametric problem in which
several different issues are involved: the strong dependence of
wave models on the corresponding wind input; the inability to
capture sub-scale phenomena; the parametrization of certain wave
properties, especially in areas with complicated coastal
formation where overshadowing and inaccurate refraction wave
features emerge; and the lack of a dense observation network
which, as in the case of atmospheric parameters over land, could
help in the systematic correction of initial conditions. The
latter increases the added value of satellite records for ocean
wave parameters.

Within this framework, the research community has followed two
main approaches over the last few years in order to minimize the
effects of the above-mentioned difficulties: assimilation of
available observations in order to improve the initial
conditions (Janssen et al. 1987; Breivik and Reistad 1994;
Lionello et al. 1992, 1995; Abdalla et al. 2005; Emmanouil et
al. 2007), and optimization of the direct model outputs by using
statistical techniques like artificial neural networks
(Makarynskyy 2004, 2005), MOS methods, Kalman filters, etc.
(Kalman 1960; Kalman and Bucy 1961; Rao et al. 1997; Galanis and
Anadranistakis 2002; Kalnay 2002; Galanis et al. 2006, 2009).

In both cases the main idea is the minimization of a “cost
function” that governs the evolution of the error. Similar
approaches are also adopted in purely statistical models used
for the estimation of wave height (see, for example, Vanem 2011;
Vanem et al. 2011). At this point a critical simplification is
usually made: the “distance” between observed and modeled values
or distributions is measured by means of classical Euclidean
geometry tools, using, for example, least squares methods. This
is, however, not always correct. Recent advances, in particular
the rapid development of information geometry, suggest that the
distributions are elements of more complicated structures,
non-Euclidean in general. More precisely, distributions of the
same type form a manifold, which is a generalization of a
Euclidean space and in which the underlying geometry may differ
significantly from the classical one (see Amari 1985; Amari and
Nagaoka 2000; Arwini and Dodson 2007, 2008). Exact knowledge of
the framework in which the data sets or distributions under
consideration are classified may give more accurate criteria and
procedures for the optimization of the final results.

The purpose of the present work is twofold. First, the sea state
characteristics in the north Atlantic ocean are analyzed by
means of a variety of statistical indices. Special attention is
given to the probability distribution function of the
significant wave height (the average height of the highest
one-third of the waves in a wave spectrum). In a second step,
the derived statistical information is utilized for the
estimation of possible biases in numerical wave predictions,
based on novel techniques provided in the framework of
information geometry.

For the above purposes, simulated wave data obtained from the
state-of-the-art numerical WAve prediction Model (WAM) (Komen et
al. 1994; WAMDIG 1988; Janssen 2000; Bidlot and Janssen 2003)
and corresponding records from all the available satellites
covering the study area (Radar Altimetry Tutorial project,
Rosmorduc et al. 2009) are employed. The distributions that the
two data sets form are recovered based on different statistical
tests, and intercomparisons are attempted.

An application of the proposed methodology is outlined by
focusing on a restricted area (northwestern coastline of France
and Spain), which avoids lumping together data from different
wave climate regions. Alternative scenarios for the estimation
of model biases are discussed. The results and ideas presented
in this work could be exploited for designing and using new
methods for the optimization of the initial conditions and the
final outputs of numerical wave prediction systems. They could
support more sophisticated ways of realizing the corresponding
cost functions, taking into account the geometric properties
(scale and shape parameters, for example) of the space that the
data under study form, and avoiding the simplifications that the
classical pattern (least squares methods) imposes.

The presented material is organized as follows. In Sect. 2 the
wave model, the data sets and the methodology used are
described. The statistical results obtained for the observations
and the corresponding modeled values are analyzed in Sect. 3. In
particular, Sect. 3.1 focuses on the optimum choice of
distributions that fit the data in the study, while in Sect. 3.2
a detailed study of the results obtained in a restricted area
(northwestern coastline of Spain and France) is presented, based
on descriptive statistics and distribution fitting. In Sect. 4 a
new approach to the problem of distance estimation between
observations and modeled values is proposed, using techniques of
information geometry. Section 4.1 is devoted to the introduction
of some general notions and results, while in Sect. 4.2 a direct
application to the wave data in the study is attempted. Finally,
the main conclusions of this work are summarized in Sect. 5.


2  Models,  data  sets  and  methodology 

2.1  The  wave  model 

Fig. 1 The study area. The red rectangle denotes the borders of
the restricted region.

The model used for wave simulation is WAM Cycle 4, ECMWF version
(Janssen 2000; Bidlot and Janssen 2003). This is a third
generation wave model which solves the wave transport equation
explicitly without any assumptions on the shape of the wave
spectrum (WAMDIG 1988; Komen et al. 1994). The model was
operated by our group (Atmospheric Modeling and Weather
Forecasting Group, University of Athens, http://www.mg.uoa.gr)
in an operational/forecasting mode (that is, using forecasted
wind forcing and not reanalysis data) for a period of 12 months
(year 2008) covering the north Atlantic ocean (Latitude
0°N-80°N, Longitude 100°W-30°E, Fig. 1). The wave spectrum was
discretized to 30 frequencies (range 0.0417-0.54764 Hz,
logarithmically spaced) and 24 directions (equally spaced). The
horizontal resolution used was 0.5° × 0.5° and the propagation
time step 300 s. WAM ran in deep water mode with no refraction,
driven by 6-hourly wind input (wind speed and direction at 10 m
above sea level) obtained from the NCEP/GFS global model with
horizontal grid resolution 0.5° × 0.5°. It should be noted that
no assimilation procedure was employed, since the available
satellite data are used in our study as independent observations
against which the modeled values are evaluated.
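The discretization quoted above can be written down as a small sketch. It assumes that "logarithmically spaced" means a constant ratio between neighbouring frequencies with both end points included, and that the 0.5° grid includes both boundary lines of the domain; the variable and function names are illustrative and are not taken from the WAM code.

```python
# Spectral and spatial discretization as quoted in Sect. 2.1 (illustrative names).
N_FREQ, F_MIN, F_MAX = 30, 0.0417, 0.54764   # number of frequencies, range in Hz
N_DIR = 24                                   # number of equally spaced directions

# Assumption: constant ratio between neighbouring frequencies, end points included.
ratio = (F_MAX / F_MIN) ** (1.0 / (N_FREQ - 1))
frequencies = [F_MIN * ratio ** i for i in range(N_FREQ)]

# Equally spaced directions in degrees, starting from 0.
directions = [i * 360.0 / N_DIR for i in range(N_DIR)]


def axis(start, stop, step):
    """Regular axis from start to stop (both included) with the given step."""
    n = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(n)]


latitudes = axis(0.0, 80.0, 0.5)       # 0°N to 80°N
longitudes = axis(-100.0, 30.0, 0.5)   # 100°W to 30°E

if __name__ == "__main__":
    print("frequency ratio:", ratio)
    print("direction step (deg):", directions[1] - directions[0])
    print("grid points:", len(latitudes), "x", len(longitudes))
```

Under these assumptions the directional step is 15° and the frequency ratio follows directly from the two end points given in the text.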

2.2  The  satellite  data 

The observation data used in this study are obtained from the
ESA-CNES joint project Radar Altimetry Tutorial (Rosmorduc et
al. 2009). These data contain near-real time gridded
observations for significant wave height obtained by merging all
available relevant satellite records from official data centers:
ERS-1 and ERS-2 (ESA), Topex/Poseidon (NASA/CNES), Geosat
Follow-On (US Navy), Jason-1 (CNES/NASA), Envisat (ESA). The
system runs daily in an operational mode. Each run is based on
the available satellite data of the previous 2 days, from which
a merged map is generated. The produced interpolated outputs
cover the whole area of study (0°N-80°N, 100°W-30°E) at a
resolution of 1.0° × 1.0°. Data are cross-calibrated and quality
controlled using Jason-1 as the reference mission. The results
are improved when additional missions are available. The period
covered is again the whole year 2008.

2.3  Statistical  approaches — methodology 

Both observations and modeled wave data are studied from two
statistical points of view. The first is based on descriptive
statistical analysis methods, where conventional indices are
employed in order to capture the basic aspects of the data
evolution in space and time. The second approach is based on the
study of the probability density function that fits the
available data. This is a complementary approach that provides
additional information on the shape and scale of the data in the
study, including the possible impact of extreme values. In this
way, a complete view of the main characteristics of
observational and simulated significant wave height values is
obtained.

More  precisely,  the  following  statistical  measures  are 
used: 

• Mean value of available data:

$$\mathrm{Mean} = \mu = \frac{1}{N}\sum_{i=1}^{N} SWH(i)$$

Here SWH denotes the recorded (observed) or simulated
significant wave height value and N the size of the sample.

• Standard deviation:

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(SWH(i)-\mu\right)^{2}}$$

• Coefficient of variation:

$$\frac{\sigma}{\mu}$$

a normalized measure of the dispersion.

• Skewness:

$$g_1 = \frac{\frac{1}{N}\sum_{i=1}^{N}\left(SWH(i)-\mu\right)^{3}}{\sigma^{3}}$$

a measure of the asymmetry of the probability distribution.

• Kurtosis:

$$g_2 = \frac{\frac{1}{N}\sum_{i=1}^{N}\left(SWH(i)-\mu\right)^{4}}{\sigma^{4}} - 3$$

that gives a measure of the “peakedness” of the probability
distribution.

Additionally, the basic percentiles (P5, P10, P25 = Q1,
P50 = median, P75 = Q3, P90 and P95) are used.
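The indices listed above can be computed as in the following minimal sketch. It assumes the 1/N (population) normalization for the central moments and reports kurtosis as excess kurtosis (with 3 subtracted), which is consistent with the negative kurtosis values appearing in Table 1. The percentile rule (linear interpolation between order statistics) and the sample values are assumptions made for illustration only.

```python
import math


def describe(swh):
    """Descriptive indices of Sect. 2.3 for a list of SWH values."""
    n = len(swh)
    mean = sum(swh) / n
    # Central moments with 1/N normalization (an assumption of this sketch).
    m2 = sum((x - mean) ** 2 for x in swh) / n
    m3 = sum((x - mean) ** 3 for x in swh) / n
    m4 = sum((x - mean) ** 4 for x in swh) / n
    std = math.sqrt(m2)
    return {
        "range": max(swh) - min(swh),
        "mean": mean,
        "std": std,
        "cv": std / mean,                  # coefficient of variation
        "skewness": m3 / std ** 3,
        "kurtosis": m4 / std ** 4 - 3.0,   # excess kurtosis
    }


def percentile(swh, p):
    """Percentile p (0-100), linear interpolation between order statistics."""
    data = sorted(swh)
    pos = (len(data) - 1) * p / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


if __name__ == "__main__":
    sample = [1.2, 1.8, 2.1, 2.4, 2.9, 3.3, 4.0, 5.6]  # hypothetical SWH values
    print(describe(sample))
    print({p: percentile(sample, p) for p in (5, 10, 25, 50, 75, 90, 95)})
```

Other percentile conventions exist and give slightly different values for small samples; for samples of the size used here the differences are negligible.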

Apart from the above descriptive statistical approach, the data
in the study have been analyzed from a distributional point of
view. More precisely, the probability density functions (pdfs)
that best fit the observational and modeled significant wave
height series are identified. A variety of pdfs have been tested
(Logistic, Normal, Gamma, Log-Gamma, Log-Logistic, Lognormal,
Weibull, Generalized Logistic) at several levels of statistical
significance by utilizing different fitting tests
(Kolmogorov-Smirnov, Anderson-Darling, as well as P-P and Q-Q
plots) and statistical tools: Matlab
(http://www.mathworks.com/products/matlab/) and EasyFit
(http://www.mathwave.com/). The results reconfirm previous
studies (Nordenstrøm 1973; Thornton and Guza 1983; Ferreira and
Soares 1999, 2000; Prevosto et al. 2000; Muraleedharan et al.
2007; Gonzalez-Marco et al. 2008) proposing the Weibull
distribution as a very good choice for fitting significant wave
height data (see for example Fig. 2). However, the scale and
shape parameters obtained vary spatially and temporally (Sect.
3.1).
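The fitting step described above can be illustrated with a minimal, dependency-free sketch. It is not the Matlab or EasyFit procedure used in the study: it assumes a maximum-likelihood fit of the 2-parameter Weibull (shape found by bisection) followed by the Kolmogorov-Smirnov distance between the empirical and fitted distribution functions. The synthetic sample, the function names and the bisection bounds are all assumptions of this sketch.

```python
import math
import random


def fit_weibull(data, lo=0.05, hi=50.0, tol=1e-10):
    """Maximum-likelihood fit of the 2-parameter Weibull: (shape, scale).

    All values must be strictly positive; the shape is searched in [lo, hi].
    """
    n = len(data)
    mean_log = sum(math.log(x) for x in data) / n

    def g(k):
        # Likelihood equation for the shape; its root is the ML estimate.
        s0 = sum(x ** k for x in data)
        s1 = sum(x ** k * math.log(x) for x in data)
        return s1 / s0 - 1.0 / k - mean_log

    while hi - lo > tol:            # bisection: g is increasing in k
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    shape = 0.5 * (lo + hi)
    scale = (sum(x ** shape for x in data) / n) ** (1.0 / shape)
    return shape, scale


def weibull_cdf(x, shape, scale):
    """Cumulative distribution function of the 2-parameter Weibull."""
    return 1.0 - math.exp(-((x / scale) ** shape))


def ks_statistic(data, shape, scale):
    """Kolmogorov-Smirnov distance between empirical and fitted CDFs."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = weibull_cdf(x, shape, scale)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


if __name__ == "__main__":
    random.seed(0)
    # Synthetic stand-in for an SWH series: scale 3.0, shape 2.5 (hypothetical).
    sample = [random.weibullvariate(3.0, 2.5) for _ in range(2000)]
    shape, scale = fit_weibull(sample)
    d = ks_statistic(sample, shape, scale)
    critical = 1.36 / math.sqrt(len(sample))  # asymptotic 0.05 level
    print(shape, scale, d, critical)
```

The critical value 1.36/sqrt(N) is the usual asymptotic one for a fully specified distribution; when the parameters are estimated from the same sample, as here, the test becomes conservative, so this comparison should be read as indicative only.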

Apart from the above-mentioned “classical” statistical
approaches, one of the main novelties proposed in this work is
the utilization of non-conventional statistical techniques
obtained from a relatively new branch of mathematics,
information geometry. This approach, discussed in detail in
Sect. 4, allows the accurate description of the space to which
the results under study belong and, based on the corresponding
geometric properties, a better estimation of possible biases. In
this way, one avoids a classical simplification adopted in
conventional statistics: the calculation of distances based on
Euclidean measures.


3  Results  and  statistics 

3.1 Probability density function fitting

The data obtained for the significant wave height in the north
Atlantic ocean, as simulated by the wave model (Sect. 2.1) and
recorded by the Radar Altimetry Tool (Sect. 2.2), are studied
here focusing on the distributions that they form. The use of
all the statistical fitting tests mentioned earlier verified
that, in most of the cases, the two-parameter Weibull
distribution:

$$f(x) = \frac{\alpha}{\beta}\left(\frac{x}{\beta}\right)^{\alpha-1} e^{-\left(x/\beta\right)^{\alpha}}, \quad x > 0,$$

where α is the shape and β the scale parameter, fits the wave
data well at a statistical significance level of 0.05 or higher.
An example is presented in Fig. 2. However, different parameters
are obtained for the pdfs of satellite records and WAM values.
In addition, a non-trivial spatial variability is revealed.

It should be noted that the 3-parameter Weibull distribution
also fits the data in the study, but with trivial differences
from the 2-parameter case. Since an additional parameter would
result in far more technical calculations in the proposed
information geometry methodology without providing essential
improvement of the obtained techniques, the 2-parameter Weibull
has been adopted.
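For reference, the 3-parameter form mentioned above can be written as follows, with α the shape and β the scale parameter. The symbol γ for the location parameter is notation introduced here for illustration and is not taken from the text; setting γ = 0 recovers the 2-parameter density adopted in this work.

```latex
f(x;\alpha,\beta,\gamma) =
  \frac{\alpha}{\beta}
  \left(\frac{x-\gamma}{\beta}\right)^{\alpha-1}
  \exp\!\left[-\left(\frac{x-\gamma}{\beta}\right)^{\alpha}\right],
  \qquad x > \gamma
```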

Fig. 2 Fitting of the 2-parameter Weibull distribution to the
WAM modeled significant wave height data for May 2008

The data sets were partitioned into 3-month intervals
(December-February, March-May, June-August and
September-November) in order to have a clearer view of the
seasonal variability of the sea state. In Figs. 3, 4, 5, and 6
the shape parameter of the Weibull distribution fitted to the
satellite data is plotted over the whole area of interest, while
Figs. 7, 8, 9, and 10 contain the corresponding values for the
WAM outputs. It is worth underlining here that in both cases the
estimated values clearly increase towards offshore areas. In
particular, the maximum values emerge in the region southeast of
Greenland and south of Iceland, reaching 6.5 during the winter
period (Figs. 3, 7). For the rest of the year, the same area
retains the maximum estimated values which, however, are
significantly lower. It is also noticeable that the estimated
shape parameters for WAM outputs are elevated compared to those
of satellite records in a relatively mild but systematic way.
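The 3-month partition used for these maps can be sketched as below. The record layout (pairs of calendar month and SWH value) and all names are assumptions made for illustration; in the study the grouping would be applied per grid point before fitting a Weibull distribution to each group.

```python
from collections import defaultdict

# The four 3-month intervals named in the text.
SEASONS = {
    "DJF": (12, 1, 2),    # December-February
    "MAM": (3, 4, 5),     # March-May
    "JJA": (6, 7, 8),     # June-August
    "SON": (9, 10, 11),   # September-November
}
MONTH_TO_SEASON = {m: name for name, months in SEASONS.items() for m in months}


def split_by_season(records):
    """Group (month, swh) pairs into the four 3-month intervals."""
    groups = defaultdict(list)
    for month, swh in records:
        groups[MONTH_TO_SEASON[month]].append(swh)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical (month, SWH) records for a single grid point.
    records = [(1, 3.9), (2, 2.8), (5, 1.4), (7, 1.6), (10, 2.5), (12, 3.1)]
    for season, values in split_by_season(records).items():
        print(season, values)
```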

The Weibull scale parameter values are presented in Figs. 11,
12, 13, and 14 for satellite records and Figs. 15, 16, 17, and
18 for their WAM counterparts. The wave model in this case seems
to yield, in general, underestimated values. On the other hand,
the increased values in the southern part of the domain,
especially during summer months, can be partially attributed to
the non-uniform distribution of wave heights in this area.

It is important to underline at this point that the significant
spatial variation of both shape and scale parameters, revealed
in all the above cases, indicates that adopting uniform ways of
studying or correcting wave heights over the whole Atlantic
ocean is a risky assumption.

Fig. 3 The shape parameter of the Weibull distributions that fit
the significant wave height satellite data over the north
Atlantic ocean for the months December-February

Fig. 4 The shape parameter of the Weibull distributions that fit
the significant wave height satellite data over the north
Atlantic ocean for the months March-May

Fig. 5 The shape parameter of the Weibull distributions that fit
the significant wave height satellite data over the north
Atlantic ocean for the months June-August

Fig. 6 The shape parameter of the Weibull distributions that fit
the significant wave height satellite data over the north
Atlantic ocean for the months September-November

3.2  Focusing  on  a  restricted  area 

In this section, attention is focused on a restricted area that
has attracted increased interest due to recent activities,
mainly concerning wave energy applications: the northwest
coastline of France and Spain (inner rectangle in Fig. 1).
Indeed, several European and national projects require exact
knowledge of the local wave climate, as well as accurate sea
state prediction, in order to estimate the available energy
potential.

The sea wave characteristics are studied here from two different
points of view: descriptive statistical measures, giving the
main information for the data in the study, and distribution
fitting, in order to categorize them in a more uniform way,
appropriate for the new techniques proposed in this work.

In Table 1 the main descriptive statistical indices, as
described in Sect. 2.3, are presented in monthly intervals for
the available satellite data. The time period covered is again
the year 2008 and the sample size exceeds 2 million values. The
corresponding results for the whole time period, as well as
divided into “Summer” (April-September) and “Winter” months
(October-March), can be found in Table 2. The first conclusions
are as expected: the range of the observations as well as their
mean value and variability are higher during winter.
Furthermore, the increased kurtosis during March and May reveals
that a significant part of the variability is related to
infrequent outliers. The percentiles of the satellite records
are presented in Tables 3 and 4.

Fig. 7 The shape parameter of the Weibull distributions that fit
the WAM modeled significant wave height over the north Atlantic
ocean for the months December-February

Fig. 8 The shape parameter of the Weibull distributions that fit
the WAM modeled significant wave height over the north Atlantic
ocean for the months March-May

The corresponding statistics for WAM outputs are presented in
Tables 5, 6, 7, and 8. The basic descriptive statistical
measures can be found in Tables 5 and 6, while the corresponding
percentiles are presented in Tables 7 and 8. The same results
are graphically represented in Figs. 19, 20, 21, and 22.

Interesting conclusions can be stated here for the accuracy of
the numerical wave model WAM in an open sea area:

• WAM slightly, but constantly, overestimates wave heights
through the whole study period (Fig. 19). The time independence
of this divergence is worth mentioning.

• The variability of both observations and modeled values is
increased during winter, as expected given the unstable weather
conditions. It is worth noting that the standard deviation of
WAM is, again, consistently higher (Fig. 20).

• Significant discrepancies exist between the ranges of the wave
height results in the two sets (WAM simulations and satellite
observations). This can be, at least partly, attributed to the
fact that the observation data set is obtained by merging
different satellite measurements, a procedure that always
introduces some smoothing of the final results due to
interpolation. On the other hand, the well known difficulties of
WAM in successfully simulating the swell decay (WISE Group 2007)
also contribute to this problem.

Fig. 9 The shape parameter of the Weibull distributions that fit
the WAM modeled significant wave height over the north Atlantic
ocean for the months June-August

Fig. 10 The shape parameter of the Weibull distributions that
fit the WAM modeled significant wave height over the north
Atlantic ocean for the months September-November

• The relatively higher values of the corresponding percentiles,
as well as the monotonically increasing distances between them
(Tables 3, 4, 7, 8), confirm the overestimation of the data by
WAM simulations and the non-negligible influence of extreme
values on their distribution. Although the purpose of this work
is not to concentrate on problems of the wind/wave models that
may lead to such deviations, it should be noted that the latter
are closely related to the wind input used (discrepancies of the
atmospheric models). On the other hand, the inclusion of
currents in wave forecasting is still lacking in WAM, while
problems with the accurate simulation of the swell waves and
especially their decay, as already mentioned earlier, also
contribute to these discrepancies.


It is worth noting at this point that when wind sea and swell
components are considered, the spectral partitioning adopted
will affect the accuracy of wind sea and swell statistics. The
Hanson and Phillips formulation (developed by the Applied
Physics Department of Johns Hopkins University, 2001) for
labeling wind sea and swell is commonly applied. The main
drawback of this approach is related to fully developed wind
seas with a small wind decay but still in the same direction as
the wave field, as shown by Quentin (2002), and later by
Loffredo et al. (2009); if the new condition cannot satisfy the
formulation adopted by Hanson and Phillips, the old wind sea
will be treated as swell and the new wind sea set to zero.
Further, as documented in Loffredo et al. (2009), the Hanson and
Phillips formulation for labeling wind sea and swell may
increase the number of wind seas as compared to other commonly
used approaches for partitioning of wind sea and swell.

Springer

Stoch Environ Res Risk Assess

Fig. 11 The scale parameter of the Weibull distributions that
fit the significant wave height satellite data over the north
Atlantic ocean for the months December-February

Fig. 12 The scale parameter of the Weibull distributions that
fit the significant wave height satellite data over the north
Atlantic ocean for the months March-May

• Skewness is increased in WAM outputs compared to the
observations (Fig. 21). This higher positive asymmetry indicates
that a non-negligible portion of the modeled significant wave
height is concentrated at relatively smaller values, something
that is less obvious in the corresponding observations.

• Elevated kurtosis for WAM outputs can be attributed to the
increased influence of extreme values. This situation is more
obvious during March and the summer months (Fig. 22).


Fig. 13 The scale parameter of the Weibull distributions that
fit the significant wave height satellite data over the north
Atlantic ocean for the months June-August

Fig. 14 The scale parameter of the Weibull distributions that
fit the significant wave height satellite data over the north
Atlantic ocean for the months September-November

Studying now the same data from a distribution fitting point of
view, following the methodology discussed in Sect. 3.1, the
following points may be emphasized:

• The 2-parameter Weibull distribution seems to fit the data in
the study well, both for WAM and observed values.

• The shape parameter (α), both for the recorded and simulated
values of SWH, seems to deviate from the Rayleigh case, where
α = 2 (Tables 9, 10, 11, 12; Fig. 23). The Rayleigh distribution
was the pdf proposed in previous works (e.g., Muraleedharan et
al. 2007); the deviation found here indicates that the use of
the general 2-parameter Weibull probability density function is
more appropriate.

• The increased values of the scale parameter (β) for WAM (Fig.
24) reconfirm the overestimation of modeled values already
noticed on the basis of the descriptive statistical measures.
Moreover, the values of β for both cases follow the pattern of
the mean values, being reduced during summer months.

• The discrepancies between the parameters of the Weibull
distributions obtained for satellite records and modeled wave
height values are not major. Therefore, the techniques described
in Sect. 4.2.1 for estimating the distance between WAM outputs
and the corresponding observations can be exploited.
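Two of the points above can be checked numerically with a short sketch: the Rayleigh distribution is the special case of the 2-parameter Weibull with shape 2, and the Weibull mean is proportional to the scale parameter, which is why the scale follows the pattern of the mean values. The sketch uses the standard Weibull density with shape and scale parameters; the numerical values and function names are hypothetical.

```python
import math


def weibull_pdf(x, shape, scale):
    """Density of the 2-parameter Weibull distribution for x > 0."""
    z = x / scale
    return (shape / scale) * z ** (shape - 1.0) * math.exp(-(z ** shape))


def rayleigh_pdf(x, sigma):
    """Density of the Rayleigh distribution with parameter sigma."""
    return (x / sigma ** 2) * math.exp(-x ** 2 / (2.0 * sigma ** 2))


def weibull_mean(shape, scale):
    """Mean of the Weibull distribution: scale * Gamma(1 + 1/shape)."""
    return scale * math.gamma(1.0 + 1.0 / shape)


if __name__ == "__main__":
    scale = 2.5                      # hypothetical scale parameter
    sigma = scale / math.sqrt(2.0)   # matching Rayleigh parameter
    for x in (0.5, 1.0, 2.0, 4.0):
        # With shape 2 the two densities coincide.
        print(x, weibull_pdf(x, 2.0, scale), rayleigh_pdf(x, sigma))
    # For a fixed shape, the mean scales linearly with the scale parameter.
    print(weibull_mean(2.0, scale), weibull_mean(2.0, 2.0 * scale))
```

A fitted shape parameter that differs clearly from 2 therefore signals a departure from the Rayleigh case, which is the comparison made in the second bullet.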


4  Estimation  of  the  distance  between  observations 
and  simulated  values  using  information  geometrical 
techniques 

Fig. 15 The scale parameter of the Weibull distributions that
fit the WAM modeled significant wave height over the north
Atlantic ocean for the months December-February

Fig. 16 The scale parameter of the Weibull distributions that
fit the WAM modeled significant wave height over the north
Atlantic ocean for the months March-May

In the previous sections special attention was given to the main
statistical characteristics as well as the distributions formed
by WAM values and the corresponding satellite records for the
area of the north Atlantic ocean. The obtained results reveal
non-negligible differences between the two data sets that should
be taken into consideration in order to optimize the accuracy of
the wave model. Some new ideas in this direction, based on
information geometry (IG) techniques, are discussed in the
present work. More precisely, having already defined the
best-fitting distributions for the data in the study, a detailed
description of the space that they form is attempted, the
corresponding geometric entities are investigated, and new
techniques are proposed for the accurate estimation of the
distance between observations and modeled values.
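As a concrete illustration of what such geometric entities look like for the 2-parameter Weibull family, the sketch below assumes that the Riemannian metric is the Fisher information matrix, as suggested by the keyword "Fisher information metric". The closed-form entries are the standard Fisher information of the Weibull density in (shape, scale) coordinates and are not taken from this section; the function names and parameter values are hypothetical. The helper `local_distance` is only a first-order approximation for nearby parameter pairs, not a geodesic distance.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def weibull_fisher_metric(shape, scale):
    """Fisher information matrix of the Weibull family in (shape, scale)."""
    c = 1.0 - EULER_GAMMA
    g_aa = (c ** 2 + math.pi ** 2 / 6.0) / shape ** 2
    g_ab = -c / scale
    g_bb = (shape / scale) ** 2
    return [[g_aa, g_ab], [g_ab, g_bb]]


def local_distance(theta_obs, theta_mod):
    """First-order distance sqrt(d^T G d), with G taken at the midpoint."""
    mid = [(a + b) / 2.0 for a, b in zip(theta_obs, theta_mod)]
    g = weibull_fisher_metric(mid[0], mid[1])
    d = [b - a for a, b in zip(theta_obs, theta_mod)]
    ds2 = (g[0][0] * d[0] ** 2
           + 2.0 * g[0][1] * d[0] * d[1]
           + g[1][1] * d[1] ** 2)
    return math.sqrt(ds2)


if __name__ == "__main__":
    obs = (2.4, 2.9)   # hypothetical (shape, scale) fitted to observations
    mod = (2.6, 3.2)   # hypothetical (shape, scale) fitted to model output
    euclid = math.hypot(mod[0] - obs[0], mod[1] - obs[1])
    print("Euclidean distance in parameter space:", euclid)
    print("Fisher-metric local distance:", local_distance(obs, mod))
```

The two numbers generally differ because the metric weights a change in shape or scale according to where in the parameter space it occurs, which is the point of replacing Euclidean measures with geometric ones.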

4.1  Basic  information  geometric  concepts 

In  order  to  make  this  work  as  self-contained  as  possible,  a 
short  presentation  of  the  main  notions  and  terminology  of 
information  geometric  techniques  needed  here  follows. 


More  details  and  results  can  be  found  in  Amari  1985; 
Amari  and  Nagaoka  2000;  Arwini  and  Dodson  2007,  2008. 

Information  geometry  is  a  relatively  new  branch  of 
mathematics  in  which  the  main  idea  is  to  apply  methods 
and  techniques  of  non-Euclidean  geometry  to  probability 
theory  and  stochastic  processes.  In  particular,  information 
geometry  realizes  a  smoothly  parametrized  family  of 
probability  distributions  as  a  manifold  on  which  geomet¬ 
rical  entities  such  as  Riemannian  metrics,  distances,  cur¬ 
vature  and  affine  connections  can  be  introduced.  To  be 
more  precise,  a  family  of  probability  distributions 

$$S = \{\, p_\xi = p(x;\xi) \mid \xi = [\xi_1, \xi_2, \ldots, \xi_n] \in \Xi \,\} \tag{1}$$
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Fig.  17  The  scale  parameter  of 
the  Weibull  distributions  that  fit 
to  the  WAM  modeled 
significant  wave  height  over  the 
north  Atlantic  ocean  for  the 
months  June-August 
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Fig.  18  The  scale  parameter  of 
the  Weibull  distributions  that  fit 
to  the  WAM  modeled 
significant  wave  height  over  the 
north  Atlantic  ocean  for  the 
months  September-November 
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Table  1  The  main  statistical  parameters  for  satellite  data  in  the  restricted  area  per  month 


Statistical parameter     Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
Range                    6.25   5.75   8.23   4.36   3.02   3.33   3.03   4.47   4.69   4.40   6.21   6.72
Mean                     3.66   2.70   3.49   2.33   1.46   1.50   1.70   2.07   2.07   2.56   2.73   3.22
Std. deviation           1.16   1.06   1.42   0.79   0.53   0.50   0.61   0.78   0.92   0.79   1.25   1.15
Coef. of variation       0.32   0.39   0.41   0.34   0.37   0.33   0.36   0.38   0.45   0.31   0.46   0.36
Skewness                 0.24   0.49   1.14   0.44   1.15   0.83   0.84   1.06   0.82   0.55   0.83   0.75
Kurtosis                -0.31  -0.57   1.46  -0.30   1.47   0.78   0.04   0.70   0.28  -0.13  -0.01   0.46

where each element may be parametrized using the n real-valued variables $[\xi_1, \xi_2, \ldots, \xi_n]$ in an open subset $\Xi$ of $\mathbb{R}^n$, while the mapping $\xi \mapsto p_\xi$ is injective and smooth, is called an n-dimensional statistical manifold. The geometrical entities in a statistical manifold depend on the Fisher information matrix, which at a point $\xi$ is an $n \times n$ matrix
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Table  2  The  main  statistical  parameters  for  satellite  data  in  the 
restricted  area  summarized  for  the  whole  study  period,  the  summer 
and  winter  months 


Statistical parameter    Overall    Summer    Winter
Range                       5.04      3.82      6.26
Mean                        2.46      1.86      3.06
Std. deviation              0.91      0.69      1.14
Coef. of variation          0.37      0.37      0.37
Skewness                    0.76      0.86      0.66
Kurtosis                    0.32      0.49      0.15

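$$G(\xi) = [g_{ij}(\xi)], \tag{2}$$

defined by

$$g_{ij}(\xi) = E_{x|\xi}\big[\partial_i \ell(x;\xi)\, \partial_j \ell(x;\xi)\big] = \int \partial_i \ell(x;\xi)\, \partial_j \ell(x;\xi)\, p(x;\xi)\, dx, \quad i,j = 1,2,\ldots,n. \tag{3}$$

Here $\partial_i$ stands for the partial derivative with respect to the i-th parameter $\xi_i$, $\ell$ is the log-likelihood function:

$$\ell(x;\xi) = \ell_\xi(x) = \log[p(x;\xi)] \tag{4}$$

and

$$E_{x|\xi}[f] = \int f(x)\, p(x;\xi)\, dx \tag{5}$$

denotes the expectation with respect to the distribution p.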
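To make definitions (3)-(5) concrete, the following worked example evaluates the Fisher information in closed form. It is an added illustration only: the one-parameter exponential family and the symbol $\lambda$ are assumptions chosen for simplicity and are not part of the data analysis of this paper.

```latex
% Assumed illustrative family (not used elsewhere in the paper):
p(x;\lambda) = \lambda e^{-\lambda x}, \qquad x \ge 0,\ \lambda > 0
% Log-likelihood (Eq. 4) and its derivative with respect to the parameter:
\ell(x;\lambda) = \log\lambda - \lambda x, \qquad
\partial_\lambda \ell(x;\lambda) = \frac{1}{\lambda} - x
% Fisher information (Eq. 3), using E[x] = 1/\lambda and Var(x) = 1/\lambda^2:
g_{11}(\lambda) = E_{x|\lambda}\!\left[\left(\frac{1}{\lambda} - x\right)^{2}\right]
               = \mathrm{Var}(x) = \frac{1}{\lambda^{2}}
```

The single entry $g_{11}$ is strictly positive, so in this simple case the Fisher matrix is positive definite and defines a metric on the parameter half-line.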

The matrix $G(\xi)$ is always symmetric and positive semi-definite (Amari and Nagaoka 2000). If, in addition, $G(\xi)$ is positive definite, then a Riemannian metric (see Spivak 1965, 1979; Dodson and Poston 1991) can be defined on the statistical manifold, corresponding to the inner product induced by the Fisher information matrix on the natural basis of the coordinate system $[\xi_i]$:

$$g_{ij} = \langle \partial_i \mid \partial_j \rangle. \tag{6}$$

This Riemannian metric is called the Fisher metric or the information metric. The corresponding geometric properties of this framework are characterized by the so-called Christoffel symbols $\Gamma^{i}_{jk}$, defined by the relations:


Table 4 Percentiles for satellite data in the restricted area for the whole study period, the summer and winter months

Percentile       Overall    Summer    Winter
P5                  0.62      0.44      0.81
P10                 1.24      0.96      1.52
P25 = Q1            1.43      1.11      1.75
P50 (median)        1.77      1.35      2.20
P75 = Q3            2.30      1.73      2.88
P90                 3.00      2.21      3.80
P95                 3.75      2.89      4.62

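$$\Gamma_{jk,h}(\xi) = E_{x|\xi}\!\left[\left(\partial_j \partial_k \ell + \tfrac{1}{2}\, \partial_j \ell\, \partial_k \ell\right) \partial_h \ell\right], \quad j,k,h = 1,2,\ldots,n, \tag{7}$$

$$\Gamma_{jk,h} = \sum_{i=1}^{n} g_{hi}\, \Gamma^{i}_{jk} \quad (h = 1,2,\ldots,n). \tag{8}$$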


The minimum distance between two elements $f_1$ and $f_2$ of a statistical manifold S is defined by the corresponding geodesic $\omega$, which is the minimum length curve that connects them. Such a curve

$$\omega = (\omega_i) : \mathbb{R} \to S \tag{9}$$

satisfies the following system of 2nd order differential equations:

$$\omega_i''(t) + \sum_{j,k=1}^{n} \Gamma^{i}_{jk}(t)\, \omega_j'(t)\, \omega_k'(t) = 0, \quad i = 1,2,\ldots,n, \tag{10}$$

under the conditions $\omega(0) = f_1$, $\omega(1) = f_2$.
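As a sanity check on Eq. 10 (an added illustration, not part of the original derivation), consider the flat case in which all Christoffel symbols vanish and $f_1$, $f_2$ are read as coordinate vectors. The geodesic then reduces to the straight segment that Euclidean distances implicitly assume:

```latex
% Flat (Euclidean) special case of Eq. 10:
\Gamma^{i}_{jk} \equiv 0
\;\Longrightarrow\; \omega_i''(t) = 0, \quad i = 1,2,\ldots,n
% Imposing \omega(0) = f_1 and \omega(1) = f_2:
\;\Longrightarrow\; \omega(t) = f_1 + t\,(f_2 - f_1), \quad t \in [0,1]
% The length of this curve is the ordinary Euclidean distance:
\;\Longrightarrow\; \text{length}(\omega) = \lVert f_2 - f_1 \rVert
```

Any nonzero Christoffel symbol bends the curve away from this segment, which is why distances on a statistical manifold generally differ from Euclidean ones.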

It is worth noticing that information geometric techniques have been tested, directly or not, on different applications. Iguzquiza and Chica-Olmo (2008), for example, utilized the Fisher information matrix for geostatistical simulations for restricted samples. On the other hand, Cai et al. (2002) applied information theoretic analysis to the self-clustering of amino acids along protein chains. Resconi (2009) also relies on non-Euclidean geometric tools for a risk analysis study. However, to the authors' knowledge, the current work is the first attempt to apply such tools to meteorology/oceanography.


Table 3 Percentiles for satellite data in the restricted area per month

Percentile        Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
P5               1.89   1.28   1.67   1.21   0.80   0.82   0.92   1.15   0.84   1.49   1.17   1.64
P10              2.13   1.47   1.98   1.43   0.89   0.95   1.06   1.29   1.02   1.63   1.37   1.92
P25 = Q1         2.74   1.86   2.52   1.76   1.08   1.14   1.27   1.50   1.34   1.94   1.74   2.41
P50 (median)     3.71   2.54   3.12   2.22   1.35   1.42   1.55   1.89   1.92   2.48   2.39   3.02
P75 = Q3         4.46   3.49   4.24   2.82   1.69   1.79   1.98   2.41   2.55   3.08   3.58   3.95
P90              5.08   4.23   5.33   3.52   2.21   2.19   2.71   3.34   3.38   3.63   4.61   4.83
P95              5.56   4.63   6.37   3.83   2.51   2.38   2.97   3.70   3.92   4.03   5.07   5.37
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Table  5  The  main  statistical  parameters  for  WAM  outputs  in  the  restricted  area  per  month 


Statistical parameter     Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
Range                   11.28   8.69  18.27   7.09   5.55   6.35   9.11   8.59   7.64   7.47   9.26  11.06
Mean                     4.06   3.13   3.99   2.54   1.66   1.74   2.00   2.11   2.28   2.74   2.92   3.57
Std. deviation           1.50   1.24   1.99   1.07   0.56   0.63   0.85   1.05   1.09   1.10   1.47   1.53
Coef. of variation       0.37   0.40   0.50   0.42   0.34   0.36   0.43   0.50   0.48   0.40   0.50   0.43
Skewness                 0.82   0.79   1.92   0.75   1.24   1.14   1.77   1.96   1.30   0.66   1.19   1.11
Kurtosis                 1.07   0.50   6.61   0.68   4.17   2.57   5.95   5.21   2.38   0.44   1.52   1.90

Table 6 The main statistical parameters for WAM outputs in the restricted area summarized for the whole study period, the summer and winter months

Statistical parameter    Overall    Summer    Winter
Range                       9.20      7.39     11.01
Mean                        2.73      2.06      3.40
Std. deviation              1.17      0.88      1.47
Coef. of variation          0.43      0.42      0.43
Skewness                    1.22      1.36      1.08
Kurtosis                    2.75      3.49      2.01

Table 8 Percentiles for WAM outputs in the restricted area for the whole study period, the summer and winter months

Percentile       Overall    Summer    Winter
P5                  1.23      0.98      1.48
P10                 1.48      1.16      1.80
P25 = Q1            1.92      1.46      2.37
P50 (median)        2.52      1.88      3.15
P75 = Q3            3.31      2.48      4.15
P90                 4.24      3.17      5.32
P95                 4.95      3.74      6.16

Table 7 Percentiles for WAM outputs in the restricted area per month

Percentile        Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
P5               1.97   1.45   1.64   1.04   0.87   0.93   1.04   1.05   0.92   1.14   1.16   1.49
P10              2.31   1.75   2.02   1.30   1.03   1.08   1.20   1.19   1.13   1.46   1.37   1.91
P25 = Q1         2.94   2.25   2.68   1.79   1.30   1.29   1.44   1.44   1.51   1.96   1.84   2.57
P50 (median)     3.89   2.90   3.57   2.38   1.62   1.63   1.78   1.80   2.08   2.59   2.59   3.35
P75 = Q3         4.93   3.81   4.88   3.20   1.93   2.07   2.38   2.48   2.79   3.41   3.64   4.25
P90              6.01   4.92   6.25   3.96   2.27   2.56   3.14   3.38   3.69   4.23   4.98   5.55
P95              6.76   5.55   7.36   4.54   2.57   2.91   3.65   4.34   4.40   4.75   5.88   6.67

Fig. 19 The evolution of mean value for WAM modeled and satellite recorded significant wave height in the restricted region through the whole study period

Fig. 20 The evolution of standard deviation for WAM modeled and satellite recorded significant wave height in the restricted region through the whole study period

4.2 Application to WAM outputs and satellite data

The significant wave height data obtained in the present study, both from satellite records and the WAM model, have been shown in Sect. 3.1 to follow 2-parameter Weibull distributions. The corresponding parameters, however, seem to differ between the two data sets and to fluctuate in time and space.

In this section different scenarios will be discussed, based on information geometric techniques, concerning the optimum way of estimating the distance between the two data sets. The obtained results can be exploited in assimilation or optimization procedures for better defining the cost functions involved, aiming at the improvement of the final modeled products.

Following the formalism presented in Sect. 4.1, the family of the two-parameter Weibull distributions can be considered as a 2-dimensional statistical manifold with $\xi = [\alpha, \beta]$, $\Xi = \{[\alpha, \beta];\ \alpha, \beta > 0\}$ and

$$p(x;\xi) = \frac{\alpha}{\beta}\left(\frac{x}{\beta}\right)^{\alpha-1} e^{-(x/\beta)^{\alpha}}. \tag{11}$$

The log-likelihood function becomes:

$$\ell(x;\xi) = \log[p(x;\xi)] = \log\alpha - \log\beta + (\alpha-1)(\log x - \log\beta) - \left(\frac{x}{\beta}\right)^{\alpha}, \tag{12}$$

while the Fisher information matrix (Amari 1985; Amari and Nagaoka 2000) takes the form:

$$G(\alpha,\beta) = \begin{bmatrix} \alpha^{2}\beta^{2} & \beta(1-\gamma) \\[4pt] \beta(1-\gamma) & \dfrac{6(\gamma-1)^{2} + \pi^{2}}{6\alpha^{2}} \end{bmatrix}. \tag{13}$$

Here $\gamma = \lim_{n\to+\infty}\left(\sum_{k=1}^{n} 1/k - \ln n\right) = 0.577215$ is the Euler gamma constant. The Christoffel symbols of the 0-connection (see Amari and Nagaoka 2000; Arwini and Dodson 2007, 2008) in this case are:

$$\begin{aligned}
\Gamma^{1}_{11} &= \frac{6\left(\gamma\alpha - \alpha - \frac{\pi^{2}}{6}\right)}{\pi^{2}\beta}, &
\Gamma^{2}_{11} &= -\frac{\alpha^{3}}{\pi^{2}\beta^{2}}, \\
\Gamma^{1}_{21} = \Gamma^{1}_{12} &= \frac{6\left(\gamma^{2} - 2\gamma + \frac{\pi^{2}}{6} + 1\right)}{\pi^{2}\alpha}, &
\Gamma^{2}_{21} = \Gamma^{2}_{12} &= \frac{6\alpha(1-\gamma)}{\pi^{2}\beta}, \\
\Gamma^{1}_{22} &= -\frac{6(1-\gamma)\beta\left(\gamma^{2} - 2\gamma + \frac{\pi^{2}}{6} + 1\right)}{\pi^{2}\alpha^{3}}, &
\Gamma^{2}_{22} &= -\frac{6\left(\gamma^{2} - 2\gamma + \frac{\pi^{2}}{6} + 1\right)}{\pi^{2}\alpha}.
\end{aligned} \tag{14}$$
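The link between the Christoffel symbols of Eq. 14 and the numerical geodesic system used in Sect. 4.2.2 can be checked with a few lines of code. The following is a minimal sketch, assuming the symbols take the form $\Gamma^1_{11} = 6(\gamma\alpha - \alpha - \pi^2/6)/(\pi^2\beta)$, $\Gamma^2_{11} = -\alpha^3/(\pi^2\beta^2)$ and so on as in Eq. 14, and that they are evaluated once at the August 2008 satellite point (3.43, 2.30). The function names are hypothetical and introduced here for illustration only.

```python
import math

EULER_GAMMA = 0.5772156649015329  # the constant gamma of Eq. 13


def christoffel_weibull(alpha, beta, g=EULER_GAMMA):
    """Evaluate the Christoffel symbols of Eq. 14 at (alpha, beta).

    Keys are (upper, lower1, lower2) index triples; symmetric pairs
    such as (1, 2, 1) are represented by (1, 1, 2) only.
    """
    pi2 = math.pi ** 2
    c = g * g - 2.0 * g + pi2 / 6.0 + 1.0  # factor that recurs in Eq. 14
    return {
        (1, 1, 1): 6.0 * (g * alpha - alpha - pi2 / 6.0) / (pi2 * beta),
        (1, 1, 2): 6.0 * c / (pi2 * alpha),
        (1, 2, 2): -6.0 * (1.0 - g) * beta * c / (pi2 * alpha ** 3),
        (2, 1, 1): -alpha ** 3 / (pi2 * beta ** 2),
        (2, 1, 2): 6.0 * alpha * (1.0 - g) / (pi2 * beta),
        (2, 2, 2): -6.0 * c / (pi2 * alpha),
    }


def geodesic_coefficients(alpha, beta):
    """Coefficients of (w1')^2, w1'*w2' and (w2')^2 in each line of Eq. 10.

    The mixed term occurs twice in the double sum over j and k,
    hence the factor 2 on the symmetric symbols.
    """
    sym = christoffel_weibull(alpha, beta)
    row1 = (sym[(1, 1, 1)], 2.0 * sym[(1, 1, 2)], sym[(1, 2, 2)])
    row2 = (sym[(2, 1, 1)], 2.0 * sym[(2, 1, 2)], sym[(2, 2, 2)])
    return row1, row2
```

Calling `geodesic_coefficients(3.43, 2.30)` gives values that agree, to within 0.01, with the coefficients of the numerical system quoted for August 2008 in Sect. 4.2.2.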


The main general question that is raised is:

With the Weibull parameters $\alpha$ and $\beta$ known, what is the optimum way of estimating the distance between observations and WAM outputs?

Two scenarios are proposed.


Fig. 21 The evolution of skewness for WAM modeled and satellite recorded significant wave height in the restricted region through the whole study period

Fig. 22 The evolution of kurtosis for WAM modeled and satellite recorded significant wave height in the restricted region through the whole study period

Table 9 Weibull parameters for satellite data in the restricted area per month

Weibull parameter     Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
α (shape)            3.70   3.06   3.16   3.48   3.53   3.81   3.53   3.43   2.74   4.01   2.69   3.49
β (scale, m)         4.05   3.00   3.89   2.59   1.61   1.66   1.88   2.30   2.30   2.82   3.05   3.57

Table 10 Weibull parameters for satellite data in the restricted area for the whole study period, the summer and winter months

Weibull parameter    Overall    Summer    Winter
α (shape)               3.39      3.42      3.35
β (scale, m)            2.73      2.06      3.40

Table 12 Weibull parameters for WAM outputs in the restricted area for the whole study period, the summer and winter months

Weibull parameter    Overall    Summer    Winter
α (shape)               3.01      3.10      2.92
β (scale, m)            3.03      2.29      3.78

4.2.1 Working for points in the same neighborhood

A first approach supported by the information geometric techniques can be based on the projection of the distributions that fit the data sets to the same tangent space. Then, their distance is calculated based on the corresponding inner product. For example, the Weibull distribution followed by the satellite data obtained in the restricted area of the Northwestern European coastline (Sect. 3.2) during August 2008 has shape parameter $\alpha = 3.43$ and scale $\beta = 2.30$ m (see Tables 9, 11). The corresponding values for the WAM modeled significant wave height are $\alpha = 2.82$ and $\beta = 2.35$ m. Therefore, the observed and modeled data can be considered as elements $u_0 = W(3.43, 2.30)$, $u_1 = W(2.82, 2.35)$ of the statistical manifold S of all Weibull distributions, projected to the same tangent space. The latter can be chosen to be the tangent space $T_{u_0}S$ of $u_0$, where the inner product, and hence the distances, are defined by the Fisher information matrix at $u_0$:

$$G = \begin{bmatrix} (3.43)^{2}(2.30)^{2} & 2.30(1-\gamma) \\[4pt] 2.30(1-\gamma) & \dfrac{6(\gamma-1)^{2}+\pi^{2}}{6(3.43)^{2}} \end{bmatrix} = \begin{bmatrix} 62.23 & 0.97 \\ 0.97 & 0.16 \end{bmatrix}. \tag{15}$$

The distance between $u_0$ and $u_1$ would be in this case:

$$d(u_0, u_1) = \sqrt{(u_0 - u_1)^{T}\, G\, (u_0 - u_1)} \tag{16}$$

which should replace the classical $\sqrt{(u_0 - u_1)^{T}(u_0 - u_1)}$ used by least square methods in assimilation or other optimization procedures.

In a similar way one may also estimate the distance between any elements of the same tangent space. The novelty compared to the classical least square methods is the use of the Fisher information matrix instead of the identity, incorporating in this way the geometrical structure of the manifold of distributions.

The present approach simplifies the estimation of the distance since there is no need to solve complicated systems of differential equations such as those corresponding to geodesics (relation 10). However, an approximation error should be expected.
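The tangent-space distance of this section can be sketched in a few lines. This is an illustrative sketch only: it assumes the Fisher matrix of Eq. 13 has entries $\alpha^2\beta^2$, $\beta(1-\gamma)$ and $(6(\gamma-1)^2+\pi^2)/(6\alpha^2)$, which is the form that reproduces the rounded values 62.23, 0.97 and 0.16 of Eq. 15, and that points are ordered as (shape, scale). The function names are hypothetical, and the resulting distance value is not one reported in the text.

```python
import math

EULER_GAMMA = 0.5772156649015329  # the constant gamma of Eq. 13


def fisher_matrix_eq13(alpha, beta, g=EULER_GAMMA):
    """Fisher information matrix in the form assumed for Eq. 13."""
    off_diagonal = beta * (1.0 - g)
    return [
        [alpha ** 2 * beta ** 2, off_diagonal],
        [off_diagonal,
         (6.0 * (g - 1.0) ** 2 + math.pi ** 2) / (6.0 * alpha ** 2)],
    ]


def tangent_distance(u0, u1, G):
    """Eq. 16: sqrt((u0 - u1)^T G (u0 - u1)) for a symmetric 2 x 2 G."""
    d0 = u0[0] - u1[0]
    d1 = u0[1] - u1[1]
    quad = G[0][0] * d0 * d0 + 2.0 * G[0][1] * d0 * d1 + G[1][1] * d1 * d1
    return math.sqrt(quad)


def euclidean_distance(u0, u1):
    """Classical least-squares distance, i.e. Eq. 16 with G = identity."""
    return math.hypot(u0[0] - u1[0], u0[1] - u1[1])


# August 2008 values from Tables 9 and 11: (shape, scale in m).
satellite = (3.43, 2.30)
wam = (2.82, 2.35)
G_u0 = fisher_matrix_eq13(*satellite)
d_fisher = tangent_distance(satellite, wam, G_u0)
d_euclid = euclidean_distance(satellite, wam)
```

With these inputs the Fisher-weighted distance is markedly larger than the Euclidean one, because the first diagonal entry of G gives the shape difference far more weight than the identity does.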


Table 11 Weibull parameters for WAM outputs in the restricted area per month

Weibull parameter     Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
α (shape)            3.34   3.08   2.70   2.78   3.64   3.52   3.17   2.82   2.67   2.97   2.53   2.88
β (scale, m)         4.50   3.48   4.43   2.84   1.84   1.92   2.22   2.35   2.54   3.06   3.25   3.98

Fig. 23 The shape parameter α of the Weibull distributions that fit to WAM modeled and satellite recorded significant wave height in the restricted region through all months of 2008

Fig. 24 The scale parameter (in meters) of the Weibull distributions that fit to WAM modeled and satellite recorded significant wave height in the restricted region through all months of 2008

4.2.2 Using geodesics

The full exploitation of the information geometric framework proceeds by the use of geodesic curves $\omega = (\omega_1, \omega_2) : \mathbb{R} \to S$ for the estimation of the distances on a statistical manifold S. This results in a system of second order differential equations (Eq. 10). By substituting the values of the Christoffel symbols $\Gamma^{i}_{jk}$ (Spivak 1965, 1979; Dodson and Poston 1991) obtained for the Weibull statistical manifold (Eq. 14), the system becomes:

$$\begin{aligned}
\omega_1''(t) + \frac{6\left(\gamma\alpha - \alpha - \frac{\pi^{2}}{6}\right)}{\pi^{2}\beta}\,\big(\omega_1'(t)\big)^{2} + \frac{12\left(\gamma^{2} - 2\gamma + \frac{\pi^{2}}{6} + 1\right)}{\pi^{2}\alpha}\,\omega_1'(t)\,\omega_2'(t) - \frac{6(1-\gamma)\beta\left(\gamma^{2} - 2\gamma + \frac{\pi^{2}}{6} + 1\right)}{\pi^{2}\alpha^{3}}\,\big(\omega_2'(t)\big)^{2} &= 0, \\
\omega_2''(t) - \frac{\alpha^{3}}{\pi^{2}\beta^{2}}\,\big(\omega_1'(t)\big)^{2} + \frac{12\alpha(1-\gamma)}{\pi^{2}\beta}\,\omega_1'(t)\,\omega_2'(t) - \frac{6\left(\gamma^{2} - 2\gamma + \frac{\pi^{2}}{6} + 1\right)}{\pi^{2}\alpha}\,\big(\omega_2'(t)\big)^{2} &= 0.
\end{aligned} \tag{17}$$

In most cases, this cannot be solved analytically and the use of approximation methods is necessary.

A relevant example is presented here. The Weibull distribution that fits the satellite data obtained in the restricted area of the Northwestern European coastline during August 2008 is used again. Therefore, the probability density function of the satellite records has shape parameter $\alpha = 3.43$ and scale $\beta = 2.30$ m, while for the relevant WAM outputs $\alpha = 2.82$ and $\beta = 2.35$ m. The minimum length curve that gives the distance between the two distributions is a two dimensional curve $\omega = (\omega_1, \omega_2)$ that can be obtained as the solution of the differential system:

$$\begin{aligned}
\omega_1'' - 0.82\,(\omega_1')^{2} + 0.65\,\omega_1'\omega_2' - 0.02\,(\omega_2')^{2} &= 0, \\
\omega_2'' - 0.77\,(\omega_1')^{2} + 0.77\,\omega_1'\omega_2' - 0.32\,(\omega_2')^{2} &= 0,
\end{aligned}$$

under the conditions

$$\omega_1(0) = 3.43, \quad \omega_2(0) = 2.30, \quad \omega_1(1) = 2.82, \quad \omega_2(1) = 2.35.$$

By numerically solving this nonlinear system, one reaches the solution presented in Fig. 25. The graphical representation of the geodesic is far from linear, which would be the case if the classical (linear regression) statistical approach had been adopted. In the same figure, the spray of other geodesics emanating from the same initial point (3.43, 2.30) is also presented.

An attempt to visualize further the above approach is made in Fig. 26a and b, where the statistical manifolds formed by the satellite records and WAM outputs (monthly values) are presented as elements of the non-Euclidean space that the totality of Weibull distributions define.
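The text does not state which numerical method was used to solve the August 2008 boundary value problem. One possible approach, sketched below under stated assumptions, is a shooting method: integrate the system from t = 0 with a trial initial velocity and correct that velocity by Newton iterations until the curve ends at (2.82, 2.35). The sketch takes the constant coefficients of the system $\omega_1'' - 0.82(\omega_1')^2 + 0.65\,\omega_1'\omega_2' - 0.02(\omega_2')^2 = 0$, $\omega_2'' - 0.77(\omega_1')^2 + 0.77\,\omega_1'\omega_2' - 0.32(\omega_2')^2 = 0$ at face value; all function names, the RK4 integrator and the step counts are choices made here for illustration.

```python
def rhs(state):
    """First-order form of the system; state = (w1, w2, v1, v2)."""
    _, _, v1, v2 = state
    a1 = 0.82 * v1 * v1 - 0.65 * v1 * v2 + 0.02 * v2 * v2
    a2 = 0.77 * v1 * v1 - 0.77 * v1 * v2 + 0.32 * v2 * v2
    return (v1, v2, a1, a2)


def rk4_path(start, velocity, steps=200):
    """Integrate from t = 0 to t = 1 with classical RK4; return the path."""
    h = 1.0 / steps
    s = (start[0], start[1], velocity[0], velocity[1])
    path = [s[:2]]
    for _ in range(steps):
        k1 = rhs(s)
        k2 = rhs(tuple(x + 0.5 * h * k for x, k in zip(s, k1)))
        k3 = rhs(tuple(x + 0.5 * h * k for x, k in zip(s, k2)))
        k4 = rhs(tuple(x + h * k for x, k in zip(s, k3)))
        s = tuple(x + h * (a + 2.0 * b + 2.0 * c + d) / 6.0
                  for x, a, b, c, d in zip(s, k1, k2, k3, k4))
        path.append(s[:2])
    return path


def shoot(start, target, tol=1e-10, max_iter=50):
    """Newton shooting on the initial velocity so that w(1) = target."""
    v = [target[0] - start[0], target[1] - start[1]]  # straight-line guess
    eps = 1e-6  # step for the finite-difference Jacobian
    for _ in range(max_iter):
        end = rk4_path(start, v)[-1]
        r0, r1 = end[0] - target[0], end[1] - target[1]
        if max(abs(r0), abs(r1)) < tol:
            break
        e1 = rk4_path(start, (v[0] + eps, v[1]))[-1]
        e2 = rk4_path(start, (v[0], v[1] + eps))[-1]
        j11, j21 = (e1[0] - end[0]) / eps, (e1[1] - end[1]) / eps
        j12, j22 = (e2[0] - end[0]) / eps, (e2[1] - end[1]) / eps
        det = j11 * j22 - j12 * j21
        # Newton update: v <- v - J^{-1} r for the 2 x 2 Jacobian J
        v[0] -= (j22 * r0 - j12 * r1) / det
        v[1] -= (-j21 * r0 + j11 * r1) / det
    return v, rk4_path(start, v)


velocity, geodesic = shoot((3.43, 2.30), (2.82, 2.35))
```

The returned `geodesic` is a list of 201 points from the satellite distribution to the WAM distribution. A fuller treatment would re-evaluate the Christoffel symbols of Eq. 14 along the curve instead of holding the coefficients fixed, which is a simplification of this sketch.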

Fig. 25 a The graphical representation of the geodesic (curved line) that gives the minimum length curve connecting the satellite observations with WAM outputs for August 2008. The straight line corresponds to the Euclidean (classical) geodesic. b The graphical representation of a numerical solution spray of geodesics emanating from (3.43, 2.30) including the one to (2.82, 2.35) that gives the minimum length curve connecting the satellite observations with WAM outputs for August 2008. (Color figure online)

5 Conclusions

The results of the numerical wave prediction model WAM for an area of increased interest (the north Atlantic ocean) concerning the significant wave height over a period of 1 year were evaluated against corresponding satellite measurements. Special attention was given to the probability distribution functions formed. The outcomes were utilized in order to discuss novel statistical procedures for the quantification of the bias, based on a relatively new branch of mathematics, information geometry, which has not been exploited so far in atmospheric sciences and oceanography. The most important conclusions follow:

• Similar but not identical two-parameter Weibull distributions seem to fit the observational and modeled significant wave height values. In particular, the shape parameter values both for satellite records and WAM outputs increase when moving to offshore areas. The maximum values emerge at the sea area south of Iceland. On the other hand, increased scale parameters for both observations and model outputs in the western coast of central Africa can be attributed to the non-uniform distribution of the sea state in this area.

• The estimated shape parameters for WAM outputs exceed those of satellite records in a mild but systematic way, while the corresponding scale values for the wave model outputs, concerning the whole area of study, are slightly underestimated, indicating that the satellite records form stretched out distributions.

• WAM seems slightly but consistently to overestimate the significant wave height through the whole study period. The same holds for the variability of the simulated values as expressed by the standard deviation, which constantly exceeds that of the observations.

• Non-negligible differences exist between the ranges of SWH values for WAM outputs and observations. This can be attributed to WAM problems with swell decay as well as to the way of calculation (merging) of satellite records.

• An increased part of the distribution of modeled values, compared to the corresponding observations, is concentrated at relatively smaller values. This positive asymmetry is highlighted by the increased values of skewness.

• The variability of WAM outputs is more dependent on extreme values than that of satellite observations, as the increased kurtosis indicates, especially during the summer months.

• The parameters of the probability density functions that fit the modeled and observational data appear to have significant spatial variation. As a result, the use of the same cost function in optimization systems for the whole domain of the study is a serious simplification. In this respect information geometry techniques provide possible ways out.

• Two different scenarios for the estimation of distances between the data sets in the study are discussed, taking into account that the Weibull distributions form a 2-dimensional non-Euclidean space, in particular a Riemannian manifold, and avoiding simplifications that classical statistics adopt (use of Euclidean distances):

  • The first approach utilizes the tangent spaces at the points of interest, avoiding the solution of the complicated differential systems that arise within the information geometric framework. An approximation error is expected in this case.

  • In the second scenario the proposed geometric methodology is fully exploited and the distances are obtained based on the geodesic curves of the statistical manifold that the data in the study form.

  In both cases the obtained results deviate from those obtained in the classical case.

• An example/application of the proposed techniques to the northwestern coastline of France and Spain is discussed, clarifying the alternative way for the estimation of distances between observations and modeled values.

Fig. 26 The statistical manifolds formed by the monthly values of the satellite records (a) and WAM outputs (b) as elements of the non-Euclidean space of all Weibull distributions. A classical "BlueGreenYellow" color palette has been used depending on their approximate divergence from annual averages. (Color figure online)
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